Hypothesis Testing and Theory Evaluation

PSCI 3300.003 Political Science Research Methods

A. Jordan Nafa

University of North Texas

November 1st, 2022

Overview

  • Introduction to approaches to hypothesis testing and theory evaluation

    • History of hypothesis testing in inferential statistics

    • Perils and pitfalls of the classical approach

  • Introduction to Bayesian Inference

    • Priors, Posteriors, and Bayes Theorem

Definitions

  • A classical p-value is

    • the probability of observing a result at least as extreme as the one obtained, under the assumption the null hypothesis \(\mathrm{H_{0}}\) is true
  • A classical p-value is not

    • the probability an effect exists

    • the probability the null hypothesis is true

    • anything else that isn’t the definition provided above
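The definition above can be made concrete with a short sketch. This is a hypothetical one-sample z-test of \(\mathrm{H_{0}}: \mu = 0\) with known \(\sigma = 1\); the data, sample size, and seed are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Illustrative example: one-sample z-test of H0: mu = 0 with known sigma = 1.
# The p-value is the probability, under H0, of a test statistic at least as
# extreme as the one observed -- nothing more.
rng = np.random.default_rng(42)
sample = rng.normal(loc=0.5, scale=1.0, size=30)  # data with a true mean of 0.5

z = sample.mean() / (1.0 / np.sqrt(len(sample)))  # z statistic under H0: mu = 0
p_value = 2 * (1 - stats.norm.cdf(abs(z)))        # two-sided tail probability

print(round(p_value, 4))
```

Note that nothing in this calculation refers to the probability that \(\mathrm{H_{0}}\) is true; the null is assumed true throughout.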

Error Probabilities

                              Accept \(\mathrm{H_{0}}\)             Reject \(\mathrm{H_{0}}\)
\(\mathrm{H_{0}}\) is True    Correct Decision (\(1-\alpha\))       Type I Error (\(\alpha\))
\(\mathrm{H_{0}}\) is False   Type II Error (\(\beta\))             Correct Decision (\(1-\beta\))
  • Type I Error

    • The probability of rejecting a null hypothesis \(\mathrm{H_{0}}\) when it is in fact true
  • Type II Error

    • The probability of failing to reject a null hypothesis \(\mathrm{H_{0}}\) when it is in fact false
  • \(\alpha\) is the “significance level” and is usually fixed at .05 in practice because Fisher said so once
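The meaning of \(\alpha\) as a long-run Type I error rate can be checked by simulation. This is a hypothetical sketch: the sample size, number of simulations, and the choice of a one-sample t-test are all illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Simulation sketch: when H0 is true, a test at alpha = .05 should reject
# about 5% of the time -- that long-run rate is the Type I error rate.
rng = np.random.default_rng(1)
alpha, n_sims, rejections = 0.05, 10_000, 0

for _ in range(n_sims):
    sample = rng.normal(loc=0.0, scale=1.0, size=25)  # H0: mu = 0 is TRUE
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    rejections += p < alpha

type1_rate = rejections / n_sims
print(type1_rate)  # should be close to 0.05
```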

Fisher’s Test of Significance

Fisher (1925, 1925, 1955) proposed a test of significance to assess whether an observed result is unlikely to have arisen purely by random chance, which can be summarized as follows:

  1. Identify the null hypothesis \(\mathrm{H_{0}}\)

  2. Determine the appropriate test statistic \(T\) and its distribution under the assumption \(\mathrm{H_{0}}\) is true

  3. Estimate the test statistic \(t\) from the observed data

  4. Determine the achieved significance level that corresponds to \(t\) under the assumption \(\mathrm{H_{0}}\) is true

  5. Reject \(\mathrm{H_{0}}\) if the achieved significance level is below an arbitrary threshold; otherwise reach no conclusion
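The five steps above can be sketched in code. The use of a one-sample t-test and the simulated data are illustrative assumptions, not part of Fisher's original exposition.

```python
import numpy as np
from scipy import stats

# A sketch of Fisher's procedure using a one-sample t-test (illustrative choice).
rng = np.random.default_rng(7)
data = rng.normal(loc=0.3, scale=1.0, size=40)

# 1-2. H0: mu = 0; under H0 the t statistic follows a t distribution (df = n - 1)
# 3.   Estimate the test statistic t from the observed data
t_stat, p_achieved = stats.ttest_1samp(data, popmean=0.0)

# 4.   p_achieved is the achieved significance level corresponding to t
# 5.   Reject H0 if p_achieved falls below a chosen threshold; otherwise reach
#      no conclusion
print(f"t = {t_stat:.3f}, achieved significance level = {p_achieved:.4f}")
```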

Neyman-Pearson Hypothesis Testing

In a direct challenge to Fisher’s proposed test, Neyman and Pearson (1933, 1933) proposed a rigid decision-theoretic framework for hypothesis testing, which can be summarized as follows:

  1. Identify a hypothesis of interest, \(\mathrm{H_{a}}\), and its complement hypothesis, \(\mathrm{H_{0}}\).

  2. Determine the appropriate test statistic \(T\) and its distribution under the assumption that \(\mathrm{H_{0}}\) is true.

  3. Define a significance level \(\alpha\), and determine the corresponding critical value \(t^{*}\) of the test statistic assuming that \(\mathrm{H_{0}}\) is true

  4. Estimate the test statistic \(t\) from the data

  5. Reject \(\mathrm{H_{0}}\) and accept \(\mathrm{H_{a}}\) if the test statistic \(t\) is further than \(t^{*}\) from the expected value of the test statistic under the assumption \(\mathrm{H_{0}}\) is true. Otherwise, accept \(\mathrm{H_{0}}\).
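The procedure above can be sketched as follows. The pre-specified \(\alpha\), the one-sample t-test, and the simulated data are illustrative assumptions; the key feature is that the critical value \(t^{*}\) is fixed before looking at the data.

```python
import numpy as np
from scipy import stats

# A sketch of the Neyman-Pearson procedure with a fixed, pre-specified alpha.
rng = np.random.default_rng(3)
alpha = 0.05
data = rng.normal(loc=0.4, scale=1.0, size=50)
n = len(data)

# Critical value t* for a two-sided test at level alpha, under H0: mu = 0
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

# Test statistic estimated from the data
t_stat = data.mean() / (data.std(ddof=1) / np.sqrt(n))

# Decision rule: reject H0 and accept Ha iff |t| > t*; otherwise accept H0
decision = "reject H0, accept Ha" if abs(t_stat) > t_crit else "accept H0"
print(decision)
```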

Neyman-Pearson Hypothesis Testing

  • The Neyman-Pearson approach is decision theoretic in that, if strictly followed, it guarantees that the long-run type I error rate, or the chance of incorrectly rejecting \(\mathrm{H_{0}}\) in repeated use, never exceeds \(\alpha\).

  • For conclusions to be valid, \(\alpha\) and most aspects of analysis must be fixed prior to data collection and multiple-comparisons corrections may be required depending on the analysis.

  • The probability of rejecting \(\mathrm{H_{0}}\) when it is in fact false is called power and is defined as \(1-\beta\), where \(\beta\) represents the probability of a type II error

  • Approach can only be used to facilitate a dichotomous decision: either we reject \(\mathrm{H_{0}}\) in favor of \(\mathrm{H_{a}}\) or we fail to reject \(\mathrm{H_{0}}\)

  • In passing, note that if the goal is to decide between two competing hypotheses, NP testing tends to perform poorly (Christensen 2005).
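Power, \(1-\beta\), can also be estimated by simulation. In this hypothetical sketch the true mean, sample size, and test are illustrative assumptions; because \(\mathrm{H_{0}}\) is false by construction, the rejection rate estimates power rather than the Type I error rate.

```python
import numpy as np
from scipy import stats

# Power simulation sketch: power = 1 - beta is the probability of rejecting
# H0 when it is in fact false (here the true mean is 0.5, not 0).
rng = np.random.default_rng(11)
alpha, n_sims, rejections = 0.05, 5_000, 0

for _ in range(n_sims):
    sample = rng.normal(loc=0.5, scale=1.0, size=30)  # H0: mu = 0 is FALSE
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    rejections += p < alpha

power = rejections / n_sims
print(round(power, 3))  # estimated 1 - beta
```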

Null Hypothesis Significance Testing

  • Null Hypothesis Significance Testing (NHST) is an unholy hybrid of the Neyman-Pearson hypothesis test and Fisher’s test of significance.

    • The decision element of rejecting a null hypothesis in favor of an alternative is taken from the Neyman-Pearson framework, and the concept of “statistical significance” from the Fisherian test.

    • Reformulates modus tollens, or proof by contradiction, as a probabilistic argument, which tends to fail spectacularly in practice.

NHST and Proof by Contradiction

Proof by contradiction, a form of valid deductive logical reasoning, can be expressed as follows

If A then B

B not observed

Therefore not A

If \(\mathrm{H_{0}}\) is true then the data will follow an expected pattern

The data do not follow the expected pattern

Therefore \(\mathrm{H_{0}}\) is false


NHST reformulates these deterministic logical statements as probabilistic assertions which renders them invalid.

If A then B is highly likely

B not observed

Therefore A is highly unlikely

If a person is an American then it is highly unlikely she is a member of Congress

The person is a member of Congress

Therefore it is highly unlikely she is an American.
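The Congress example can be put in numbers via Bayes theorem. All figures below are rough, illustrative assumptions (roughly 535 members of Congress, roughly 330 million Americans, roughly 8 billion people worldwide).

```python
# Rough, illustrative figures: being American makes membership in Congress
# extremely unlikely, yet observing a member of Congress makes "American"
# certain -- the probabilistic syllogism gets it exactly backwards.
p_american = 330e6 / 8e9                   # rough share of the world that is American
p_congress_given_american = 535 / 330e6    # ~535 members among ~330 million Americans
p_congress_given_not_american = 0.0        # members of Congress must be American

# Bayes theorem: P(American | Congress)
numerator = p_congress_given_american * p_american
denominator = numerator + p_congress_given_not_american * (1 - p_american)
posterior = numerator / denominator
print(posterior)  # 1.0, even though P(Congress | American) is ~1.6e-06
```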

Comparison of Approaches

Figure 1. Comparison of Frequentist Approaches to Hypothesis Testing

Statistical Significance

  • When a p-value is less than \(\alpha\) it is often referred to as “statistically significant”

  • In practice, all this means is that someone is telling you they are surprised by a result

    • It doesn’t mean that anything important has been discovered

    • It doesn’t say anything about the size of an effect or its substantive significance

    • Its meaning is entirely context-specific and is difficult to assess without domain knowledge

  • Since p-values are mostly just a crude proxy for sample size, as \(n \longrightarrow \infty\) most things are “statistically significant”
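The last point can be demonstrated directly. In this hypothetical sketch a substantively trivial effect (a true mean of 0.01) becomes “statistically significant” once the sample is large enough; the effect size, sample sizes, and seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# A tiny, substantively trivial effect (true mean 0.01) becomes
# "statistically significant" as n grows.
rng = np.random.default_rng(0)
results = {}
for n in (100, 10_000, 1_000_000):
    sample = rng.normal(loc=0.01, scale=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    results[n] = p
    print(f"n = {n:>9,}: p = {p:.4f}")
```

Nothing about the effect changed between the three tests; only the sample size did.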

Where We’re Headed

  • Introduction to Bayesian Inference

    • Priors, Posteriors, and Bayes Theorem

    • Estimation and Uncertainty

References

Christensen, Ronald. 2005. “Testing Fisher, Neyman, Pearson, and Bayes.” The American Statistician 59(2): 121–26.
Fisher, Sir Ronald A. 1925. Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd.
———. 1925. “Theory of Statistical Estimation.” Proceedings of the Cambridge Philosophical Society 22: 700–725.
———. 1955. “Statistical Methods and Scientific Induction.” Journal of the Royal Statistical Society: Series B (Methodological) 17(1): 69–78.
Neyman, Jerzy, and Egon S. Pearson. 1933. “On the Problem of the Most Efficient Tests of Statistical Hypotheses.” Philosophical Transactions of the Royal Society of London, Series A 231(694–706): 289–337.
———. 1933. “The Testing of Statistical Hypotheses in Relation to Probabilities a Priori.” Mathematical Proceedings of the Cambridge Philosophical Society 29: 492–510.